Sitemap URL Discovery (sitemap.xml + robots.txt → all URLs)

Developer: Hojun Lee (maintained by Community)

Sitemap URL Discovery

Given a domain, this actor finds sitemap.xml or sitemap_index.xml (also via robots.txt), recursively expands nested sitemaps, and returns one row per discovered URL with lastmod / changefreq / priority. Useful for SEO audits, crawl-target prep, and content cataloging. Pricing: $0.01 site fee + $0.0001/URL.


Why this exists

Before you scrape, audit, or index a site, you need to know what's there. The site's own sitemap is the authoritative list — but discovering it requires:

  1. Checking common paths (sitemap.xml, sitemap_index.xml, wp-sitemap.xml)
  2. Parsing robots.txt for Sitemap: directives
  3. Recursively walking sitemap-index → child sitemaps
  4. Parsing each one for <url> records

This actor does all of it with sane fallbacks and returns a summary row plus one row per discovered URL.
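The discovery steps listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the actor's actual source: the function names (`sitemaps_from_robots`, `walk_sitemap`), the `fetch` callback, and the `max_files` cap are all assumptions made for the example, and the pages are served from an in-memory dict instead of the network.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, List, Optional, Set


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Collect URLs from 'Sitemap:' lines (the field name is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        # partition splits on the FIRST colon, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def local_name(tag: str) -> str:
    """Drop the XML namespace prefix: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def walk_sitemap(
    url: str,
    fetch: Callable[[str], Optional[str]],
    seen: Optional[Set[str]] = None,
    max_files: int = 50,  # hypothetical cap, similar in spirit to maxSitemapFiles
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per <url> record, recursing through <sitemapindex> files."""
    seen = set() if seen is None else seen
    if url in seen or len(seen) >= max_files:
        return  # already visited (cycle) or file budget exhausted
    seen.add(url)
    body = fetch(url)
    if not body:
        return
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return  # broken nested sitemap: skip it and keep going
    is_index = local_name(root.tag) == "sitemapindex"
    for entry in root:
        fields = {local_name(c.tag): (c.text or "").strip() for c in entry}
        loc = fields.get("loc")
        if not loc:
            continue
        if is_index:
            yield from walk_sitemap(loc, fetch, seen, max_files)
        else:
            yield {
                "url": loc,
                "lastmod": fields.get("lastmod"),
                "changefreq": fields.get("changefreq"),
                "priority": fields.get("priority"),
            }


# In-memory stand-in for HTTP; a real fetch would return None on 404s.
NS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
pages = {
    "https://example.com/robots.txt":
        "User-agent: *\nSitemap: https://example.com/sitemap_index.xml\n",
    "https://example.com/sitemap_index.xml":
        f"<sitemapindex {NS}><sitemap><loc>https://example.com/posts.xml</loc>"
        "</sitemap></sitemapindex>",
    "https://example.com/posts.xml":
        f"<urlset {NS}><url><loc>https://example.com/hello</loc>"
        "<lastmod>2026-06-08</lastmod></url></urlset>",
}

rows = []
for sitemap_url in sitemaps_from_robots(pages["https://example.com/robots.txt"]):
    rows.extend(walk_sitemap(sitemap_url, pages.get))
```

The `seen` set guards against sitemap indexes that reference each other, and a parse failure in one child sitemap does not abort the rest of the walk.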


What you get

Summary row

{
  "_type": "summary",
  "site_url": "https://www.apify.com",
  "sitemaps_scanned": 5,
  "sitemap_urls": [
    "https://www.apify.com/sitemap.xml",
    "https://www.apify.com/sitemap-index.xml",
    "https://www.apify.com/sitemap/actors1.xml",
    ...
  ],
  "urls_discovered": 12384
}

Per-URL row

{
  "_type": "url",
  "url": "https://www.apify.com/store/actors/...",
  "lastmod": "2026-06-08",
  "changefreq": "weekly",
  "priority": "0.7"
}
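
Because the summary and the per-URL records arrive in the same output and are distinguished only by `_type`, a consumer typically separates them first. A minimal sketch, assuming the rows have already been loaded as a list of dicts; the helper names `split_rows` and `priority_of` are hypothetical. Note that `priority` is a string in the sample row above, so it needs converting before any numeric comparison.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def split_rows(rows: List[Row]) -> Tuple[Optional[Row], List[Row]]:
    """Separate the single summary row from the per-URL rows using `_type`."""
    summary = None
    urls = []
    for row in rows:
        if row.get("_type") == "summary":
            summary = row
        elif row.get("_type") == "url":
            urls.append(row)
    return summary, urls


def priority_of(row: Row) -> Optional[float]:
    """Convert the string priority (e.g. "0.7") to a float; None if absent or invalid."""
    try:
        return float(row["priority"])
    except (KeyError, TypeError, ValueError):
        return None


sample = [
    {"_type": "summary", "site_url": "https://www.apify.com", "urls_discovered": 1},
    {"_type": "url", "url": "https://www.apify.com/store", "priority": "0.7"},
]
summary, url_rows = split_rows(sample)
```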

Quick start

Discover all URLs on a domain

{
  "siteUrl": "https://www.apify.com"
}

Only product / actor pages

{
  "siteUrl": "https://www.apify.com",
  "pathContains": "/store/actors/",
  "maxUrls": 5000
}

Cap scan size for huge sites

{
  "siteUrl": "https://en.wikipedia.org",
  "maxUrls": 100000,
  "maxSitemapFiles": 50
}

Pricing

Pay-Per-Event:

  • $0.01 — flat fee per site (covers initial discovery)
  • $0.0001 — per URL row returned

| Run | URLs | Cost |
|---|---|---|
| Small SaaS site | 200 | $0.03 |
| Mid-sized blog | 5,000 | $0.51 |
| Mega site | 100,000 | $10.01 |

For one-off audits, compare this with annual licenses for Screaming Frog SEO Spider ($259/yr) or Sitebulb ($175/yr).
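
The cost of a run follows directly from the two fees listed above: the flat site fee plus the per-URL fee times the number of URL rows. A small sketch for estimating a run before starting it; the function name `estimate_cost` is hypothetical, and `Decimal` is used only to avoid floating-point rounding noise in currency amounts.

```python
from decimal import Decimal

SITE_FEE = Decimal("0.01")       # flat fee per site
PER_URL_FEE = Decimal("0.0001")  # fee per URL row returned


def estimate_cost(url_rows: int) -> Decimal:
    """Estimated USD cost of one run that returns `url_rows` URL rows."""
    return SITE_FEE + PER_URL_FEE * url_rows


for label, n in [("Small SaaS site", 200), ("Mid-sized blog", 5000), ("Mega site", 100000)]:
    print(f"{label}: ${estimate_cost(n):.2f}")
```

Setting `maxUrls` in the input gives an upper bound on the per-URL part of the bill, so `estimate_cost(maxUrls)` is a worst-case figure for a single site.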


Use cases

  1. SEO audit — Pull every URL with its lastmod; find stale content
  2. Crawl planning — Feed URLs into Web → Markdown or your own scraper
  3. Content monitoring — Detect new URLs by comparing snapshots over time
  4. Competitor research — See what a competitor's catalog looks like
  5. Sitemap sanity check — Verify sitemap-index works; catch broken nested sitemaps
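
Two of these use cases, stale-content detection and content monitoring, reduce to simple operations on the per-URL rows. The following is a sketch under the assumption that each snapshot is a list of `"_type": "url"` rows already loaded into memory; `new_urls` and `stale_urls` are hypothetical helper names, not part of the actor.

```python
from datetime import date
from typing import Any, Dict, List

Row = Dict[str, Any]


def new_urls(previous: List[Row], current: List[Row]) -> List[str]:
    """URLs present in the current snapshot but absent from the previous one."""
    known = {row["url"] for row in previous}
    return sorted({row["url"] for row in current} - known)


def stale_urls(rows: List[Row], cutoff: date) -> List[str]:
    """URLs whose lastmod date is strictly before `cutoff`.

    Rows without lastmod, or with a value whose first 10 characters are not
    a YYYY-MM-DD date (e.g. a bare year), are skipped rather than guessed at.
    """
    stale = []
    for row in rows:
        lastmod = row.get("lastmod")
        if not lastmod:
            continue
        try:
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue
        if modified < cutoff:
            stale.append(row["url"])
    return stale


last_week = [{"url": "https://example.com/a", "lastmod": "2024-01-15"}]
today = [
    {"url": "https://example.com/a", "lastmod": "2024-01-15"},
    {"url": "https://example.com/b", "lastmod": "2026-06-08T09:30:00+00:00"},
    {"url": "https://example.com/c"},
]
added = new_urls(last_week, today)
old = stale_urls(today, date(2025, 1, 1))
```

Slicing `lastmod` to its first 10 characters lets the same check handle both plain dates and full timestamps.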

Limitations

  • No HTML scraping fallback: if a site has no sitemap (rare for serious sites), this actor returns 0 URLs. To crawl HTML links instead, use a crawl-specific actor.
  • Doesn't honor noindex: a URL listed in the sitemap may still be marked noindex in its HTML; this actor returns whatever the sitemap lists.
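
If noindex matters for your audit, one option is to post-filter the returned URLs yourself. The sketch below is not part of the actor: it assumes you have already downloaded a page's HTML by your own means and checks only the `<meta name="robots">` tag. It does not cover the `X-Robots-Tag` HTTP header or crawler-specific meta tags. The names `RobotsMetaParser` and `has_noindex` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag carries noindex (or none)."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        # "none" is shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """True if the HTML contains a robots meta tag that forbids indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
blocked = has_noindex(page)
```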


Feedback

A short review helps SEO engineers find it: Leave a review on Apify Store